Online Automatic Post-Editing across Domains

Authors

  • Rajen Chatterjee
  • Gebremedhen Gebremelak
  • Matteo Negri
  • Marco Turchi

Abstract

Recent advances in automatic post-editing (APE) have shown that it is possible to automatically correct systematic errors made by machine translation (MT) systems. However, most current APE techniques have only been tested in controlled batch environments, where training and test data are sampled from the same distribution and the training set is fully available. In this paper, we propose an online APE system based on an instance selection mechanism that can work efficiently with a stream of data points belonging to different domains. Our results on a mix of two datasets show that our system is able to: i) outperform state-of-the-art online APE solutions and ii) significantly improve the quality of raw MT output.
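
The abstract names the instance selection mechanism without detailing it. As a rough illustration only, here is a minimal sketch of online instance selection over a stream of (source, MT, post-edit) triplets. It is not the authors' implementation. It assumes TF-IDF cosine similarity over source sentences, a fixed similarity threshold, a top-k cutoff, and that the MT output passes through unchanged when no past instance is similar enough. `InstanceMemory`, `select`, `process_stream`, `k`, and `threshold` are hypothetical names. The correction step is a deliberately trivial stand-in for training a local APE model on the selected instances.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


class InstanceMemory:
    """Hypothetical growing store of (src, mt, pe) triplets seen so far."""

    def __init__(self):
        self.triplets = []   # (src, mt, pe) strings in arrival order
        self.tf = []         # term counts of each stored source sentence
        self.df = Counter()  # number of stored sources containing each term

    def add(self, src, mt, pe):
        counts = Counter(tokenize(src))
        self.triplets.append((src, mt, pe))
        self.tf.append(counts)
        self.df.update(counts.keys())

    def _tfidf(self, counts):
        n = len(self.triplets)
        # Smoothed IDF: unseen and ubiquitous terms keep a positive weight.
        return {t: c * (math.log((1 + n) / (1 + self.df[t])) + 1.0)
                for t, c in counts.items()}

    def select(self, src, k=3, threshold=0.5):
        """Return up to k stored triplets whose source has cosine similarity
        >= threshold with `src`, most similar first."""
        query = self._tfidf(Counter(tokenize(src)))
        q_norm = math.sqrt(sum(w * w for w in query.values()))
        scored = []
        for i, counts in enumerate(self.tf):
            vec = self._tfidf(counts)
            dot = sum(w * vec.get(t, 0.0) for t, w in query.items())
            norm = q_norm * math.sqrt(sum(w * w for w in vec.values()))
            sim = dot / norm if norm else 0.0
            if sim >= threshold:
                scored.append((sim, i))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.triplets[i] for _, i in scored[:k]]


def process_stream(stream, k=3, threshold=0.5):
    """Online loop: propose an output for each (src, mt), then learn from pe."""
    memory = InstanceMemory()
    outputs = []
    for src, mt, pe in stream:
        selected = memory.select(src, k, threshold)
        if not selected:
            # No sufficiently similar past instance: keep the MT output as is.
            outputs.append(mt)
        else:
            # Placeholder for training a local APE model on `selected`:
            # reuse a past post-edit only when its MT output matches exactly.
            reuse = [p for _, m, p in selected if m == mt]
            outputs.append(reuse[0] if reuse else mt)
        memory.add(src, mt, pe)  # the human post-edit arrives after output
    return outputs


if __name__ == "__main__":
    # Toy stream mixing a medical and a software domain (invented examples).
    stream = [
        ("the patient has a fever", "il paziente ha una febre", "il paziente ha la febbre"),
        ("click the save button", "clicca il bottone salva", "fai clic sul pulsante Salva"),
        ("the patient has a fever", "il paziente ha una febre", "il paziente ha la febbre"),
        ("click the cancel button", "clicca il bottone annulla", "fai clic sul pulsante Annulla"),
    ]
    print(process_stream(stream))
```

On this toy stream, the third segment is matched to the first and reuses its post-edit. The fourth segment is matched to the second, from the same domain, but is left unchanged because its MT output differs. Selecting instances per input, rather than training one global model, is what lets a memory holding mixed domains avoid applying corrections learned on unrelated data.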

Publication date: 2016